Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation
The discovery of novel materials and functional molecules can help to solve
some of society's most urgent challenges, ranging from efficient energy
harvesting and storage to uncovering novel pharmaceutical drug candidates.
Traditionally, matter engineering -- generally denoted as inverse design -- has
relied heavily on human intuition and high-throughput virtual screening. The
last few years have seen the emergence of significant interest in
computer-inspired designs based on evolutionary or deep learning methods. The
major challenge here is that the standard string-based molecular representation
SMILES shows substantial weaknesses in that task because large fractions of
strings do not correspond to valid molecules. Here, we solve this problem at a
fundamental level and introduce SELFIES (SELF-referencIng Embedded Strings), a
string-based representation of molecules which is 100% robust. Every SELFIES
string corresponds to a valid molecule, and SELFIES can represent every
molecule. SELFIES can be directly applied in arbitrary machine learning models
without the adaptation of the models; each of the generated molecule candidates
is valid. In our experiments, the model's internal memory stores two orders of
magnitude more diverse molecules than a similar test with SMILES. Furthermore,
as all molecules are valid, it allows for explanation and interpretation of the
internal working of the generative models.
Comment: 6+3 pages, 6+1 figures
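The weakness of SMILES described in this abstract can be illustrated with a small sketch. It is not the SELFIES algorithm and does not use the authors' code: it only estimates how many randomly assembled SMILES-like strings pass a crude syntactic check (balanced branch parentheses and paired single-digit ring closures). The check, the alphabet, and the function names are assumptions made for illustration; passing it does not imply chemical validity.

```python
import random


def looks_syntactically_valid(smiles: str) -> bool:
    """Crude proxy for SMILES well-formedness (assumption, not a real parser).

    Checks only that branch parentheses are balanced and that every
    single-digit ring-closure label appears an even number of times.
    """
    depth = 0
    open_rings = set()
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing a branch that was never opened
                return False
        elif ch.isdigit():
            # A ring label opens on first sight and closes on the second.
            if ch in open_rings:
                open_rings.remove(ch)
            else:
                open_rings.add(ch)
    return depth == 0 and not open_rings


def fraction_valid(n_samples: int, length: int, seed: int = 0,
                   alphabet: str = "CNO()1=") -> float:
    """Fraction of random strings over `alphabet` that pass the crude check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        hits += looks_syntactically_valid(candidate)
    return hits / n_samples


if __name__ == "__main__":
    print(looks_syntactically_valid("C1CC1"))  # cyclopropane: ring label paired
    print(looks_syntactically_valid("C(C"))    # unclosed branch
    print(fraction_valid(n_samples=1000, length=12))
```

Even this lenient check rejects many random strings, which is the failure mode a representation that is robust by construction avoids.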
Bayesian optimization with known experimental and design constraints for chemistry applications
Optimization strategies driven by machine learning, such as Bayesian
optimization, are being explored across experimental sciences as an efficient
alternative to traditional design of experiment. When combined with automated
laboratory hardware and high-performance computing, these strategies enable
next-generation platforms for autonomous experimentation. However, the
practical application of these approaches is hampered by a lack of flexible
software and algorithms tailored to the unique requirements of chemical
research. One such aspect is the pervasive presence of constraints in the
experimental conditions when optimizing chemical processes or protocols, and in
the chemical space that is accessible when designing functional molecules or
materials. Although many of these constraints are known a priori, they can be
interdependent, non-linear, and result in non-compact optimization domains. In
this work, we extend our experiment planning algorithms Phoenics and Gryffin
such that they can handle arbitrary known constraints via an intuitive and
flexible interface. We benchmark these extended algorithms on continuous and
discrete test functions with a diverse set of constraints, demonstrating their
flexibility and robustness. In addition, we illustrate their practical utility
in two simulated chemical research scenarios: the optimization of the synthesis
of o-xylenyl Buckminsterfullerene adducts under constrained flow conditions,
and the design of redox active molecules for flow batteries under synthetic
accessibility constraints. The tools developed constitute a simple, yet
versatile strategy to enable model-based optimization with known experimental
constraints, contributing to its applicability as a core component of
autonomous platforms for scientific discovery.
Comment: 15 pages, 5 figures (SI with 13 pages, 8 figures)
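The idea of a known constraint can be sketched as a user-supplied feasibility predicate that the optimizer consults before proposing an experiment. The sketch below is not the Phoenics or Gryffin interface and does not perform Bayesian optimization: it swaps in plain random search with rejection sampling to keep the example short. The parameter names, the constraints, and the objective are all hypothetical.

```python
import random
from typing import Callable, Dict, Optional

Params = Dict[str, float]


def is_feasible(p: Params) -> bool:
    """Hypothetical known constraints on two flow rates in [0, 1].

    The two parameters are interdependent (their sum is capped) and one
    constraint is non-linear (their product has a lower bound).
    """
    return (p["flow_a"] + p["flow_b"] <= 1.2
            and p["flow_a"] * p["flow_b"] >= 0.05)


def objective(p: Params) -> float:
    """Hypothetical stand-in for a measurement; lower is better.

    Its unconstrained minimum (0.9, 0.9) violates the sum constraint.
    """
    return (p["flow_a"] - 0.9) ** 2 + (p["flow_b"] - 0.9) ** 2


def sample_feasible(rng: random.Random,
                    known_constraint: Callable[[Params], bool],
                    max_tries: int = 10_000) -> Params:
    """Rejection sampling: draw uniformly until the constraint holds."""
    for _ in range(max_tries):
        p = {"flow_a": rng.uniform(0.0, 1.0), "flow_b": rng.uniform(0.0, 1.0)}
        if known_constraint(p):
            return p
    raise RuntimeError("no feasible point found; constraints may be too tight")


def constrained_random_search(n_iter: int, seed: int = 0) -> Params:
    """Evaluate only feasible candidates and keep the best one seen."""
    rng = random.Random(seed)
    best: Optional[Params] = None
    for _ in range(n_iter):
        candidate = sample_feasible(rng, is_feasible)
        if best is None or objective(candidate) < objective(best):
            best = candidate
    assert best is not None
    return best


if __name__ == "__main__":
    best = constrained_random_search(n_iter=500)
    print(best, objective(best))
```

Because infeasible candidates are never evaluated, the search settles near the constraint boundary rather than at the infeasible unconstrained optimum. A model-based planner would replace the uniform proposals with acquisition-driven ones while keeping the same feasibility check.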